
feat: add support for bulk loading#153

Merged
guycipher merged 1 commit into wildcatdb:master from mehrdad3301:bulk_loading
Feb 13, 2026

Conversation

@mehrdad3301
Contributor

Related to issue

What was done

  • Branches: master (no bulk load) vs bulk_loading (BTree BulkPutSorted in flusher + compactor).
  • Tool: wildcatdb/bench, fillseq only.
  • Configs:
    • A – many small flushes: write_buffer_size=262144, num=100000
    • B – medium flushes: write_buffer_size=2097152, num=100000
    • C – few large flushes: write_buffer_size=8388608, num=200000
  • Runs: master = 2 runs per config; bulk_loading = 1 run per config.
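For context on what a sorted bulk build changes, here is a minimal, hypothetical sketch (not WildcatDB's actual `BulkPutSorted` code; `leafCap` and `bulkLoadLeaves` are made-up names): when the memtable is already sorted, the flusher can pack B-tree leaves by slicing the key stream left to right in one O(n) pass, instead of doing a top-down search-and-insert per key.

```go
package main

import "fmt"

// leafCap is an illustrative fanout, not WildcatDB's real node size.
const leafCap = 4

type leaf struct {
	keys []string
}

// bulkLoadLeaves packs already-sorted keys into leaves sequentially,
// avoiding the per-key tree descent that a plain Put loop would do.
func bulkLoadLeaves(sorted []string) []leaf {
	var leaves []leaf
	for len(sorted) > 0 {
		n := leafCap
		if len(sorted) < n {
			n = len(sorted)
		}
		leaves = append(leaves, leaf{keys: sorted[:n]})
		sorted = sorted[n:]
	}
	return leaves
}

func main() {
	keys := []string{"a", "b", "c", "d", "e", "f", "g", "h", "i"}
	// 9 keys with fanout 4 -> leaves of 4, 4, and 1 keys.
	fmt.Println(len(bulkLoadLeaves(keys))) // prints 3
}
```

The internal nodes would then be built bottom-up over these leaves; the point is only that sorted input removes per-key search cost during flush.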

Results

| Config | Branch | fillseq ops/sec | Duration | SSTables | Note |
|---|---|---|---|---|---|
| A (256KB, 100k) | master | 113036 / 108259 | 884ms / 923ms | 3 / 4 | 2 runs |
| A (256KB, 100k) | bulk_loading | 110262 | 906ms | 22 | different flush count |
| B (2MB, 100k) | master | 102249 / 86633 | 978ms / 1154ms | 6 / 1 | 2 runs, high variance |
| B (2MB, 100k) | bulk_loading | 114565 | 872ms | 7 | |
| C (8MB, 200k) | master | 118430 / 73202 | 1.69s / 2.73s | 3 / 2 | 2 runs, high variance |
| C (8MB, 200k) | bulk_loading | 66374 | 3.01s | 8 | |

Summary

  • End-to-end fillseq does not show a clear, consistent win for bulk_loading.
  • SSTable counts differ substantially between branches (e.g. config A: master 3–4 vs bulk_loading 22), so the two branches are not exhibiting the same flush/compaction behavior. That makes it hard to attribute throughput differences to bulk load alone.
  • Config B: bulk_loading was faster in the single run; config C: bulk_loading was slower with more SSTables. High variance on master (B and C) suggests more runs would help.

Ask for help

  1. Why might bulk_loading produce more SSTables (e.g. 22 vs 3–4 on config A)? Is there a known difference in when flushes are triggered or how compactions run on the bulk_loading branch that could explain this?
  2. Best way to measure bulk load benefit: Would a flush-only micro-benchmark (e.g. fixed N keys, time only the B-tree build during flush on master vs bulk_loading, same N) be the right next step to isolate the effect of bulk insertion without conflating it with flush count / compaction behavior?
  3. Any guidance on making bulk loading show a measurable benefit in real workloads (e.g. recommended write_buffer_size or workload shape), or on code paths to double-check (e.g. slice building, flush trigger conditions) would be very helpful.

Thanks in advance.

@guycipher
Member

Bulk loading with keys that are actually sorted will show up in benchmarks. Bulk loading shouldn't cause more SSTables; it should produce the same number of SSTables, since bulk loading just creates a new SSTable.

@guycipher guycipher merged commit 096ddc9 into wildcatdb:master Feb 13, 2026
4 checks passed